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TECHNIQUE FOR DETERMINING AN OPTIMAL NUMBER OF TASKS IN A 
PARALLEL DATABASE LOADING SYSTEM WITH MEMORY CONSTRAINTS 

BACKGROUND OF THE INVENTION 

5 1. Field of the Invention 

This invention relates in general to computer-implemented database systems, and, in 
particular, to a technique for optimizing a number of tasks to be executed in a parallel database 
loading system. 

2. Description of Related Art 

10 Databases are computerized information storage and retrieval systems. A Relational 

Database Management System (RDBMS) is a database management system (DBMS) that uses 
relational techniques for storing and retrieving data. Relational databases are organized into 
tables which consist of rows and columns of data. The rows are formally called tuples or 
records. A database will typically have many tables and each table will typically have multiple 

15 tuples and multiple columns. The tables are typically stored on direct access storage devices 
(DASD), such as magnetic or optical disk drives for semi-permanent storage. 

A table can be divided into partitions, with each partition containing a portion of the 
table's data. Each partition may reside on a different data storage device. By partitioning tables, 
the speed and efficiency of data access can be improved. For example, partitions containing 

20 more frequently used data can be placed on faster data storage devices, and parallel processing 
of data can be improved by spreading partitions over different DASD volumes, with each I/O 
stream on a separate channel path. Partitioning also promotes high data availability, enabling 
application and utility activities to progress in parallel on different partitions of data. 

In an attempt to speed up the loading of data, various approaches have been tried 

25 involving the use of parallel processing. Parallel processing exploits the multiprocessor 
capabilities of modern high speed computers and refers to the use of several processors to load 
data into different parts of the database in parallel with each other. That is, data is loaded into 
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different partitions of a database by load utilities that are executing concurrently. In particular, 
the data to be loaded into the database is sorted and then separated into multiple input files. 
Then, a load utility may load data into a tablespace (i.e., read data from an input file and store 
the data in a tablespace). 

5 However, conventional techniques for loading databases in parallel do not fully utilize 

the resources of the hosting parallel database loading system. Because parallel load processes 
only operate on a single partition and a single input file at a time, the processing capabilities of 
the particular host system need to be taken into account in order to optimize the speed at which 
databases are loaded. Past loading techniques fail to optimize the tasks required to load a 

10 database, and as such, unnecessarily slow processing times result. 

Therefore, there is a need in the art for an improved technique for optimizing a number 
of tasks to be invoked simultaneously in a parallel database loading system. 

SUMMARY OF THE INVENTION 
To overcome the limitations in the prior art described above, and to overcome other 
1 5 limitations that will become apparent upon reading and understanding the present specification, 
the present invention discloses a method, apparatus, and article of manufacture for a computer 
implemented technique for optimizing a number of tasks to be executed in a parallel database 
loading system. 

In accordance with the present invention, the number of parallel load operations to load 
20 data into a data store connected to a computer is optimized by identifying the memory 
constraints of the computer environment employed, identifying the available processing 
capabilities of the computer system, and determining a number of load and sort processes to be 
started in parallel based on the identified memory constraints and processing capabilities. 
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BRIEF DESCRIPTION OF THE DRAWINGS 
Referring now to the drawings in which like reference numbers represent corresponding 
parts throughout: 

FIG. 1 is a diagram illustrating a computer hardware system that could be employed in 
5 accordance with the present invention; 

FIG. 2 is a detailed diagram of the computer hardware system of FIG. 1 ; 

FIG. 3 is a diagram illustrating loading, partitioning, sorting, building, and indexing of 
data in the hardware system of FIG. 2; 

FIG. 4 is a flow-chart illustrating the process flow for the system of FIG. 1; 
1 0 FIG. 5 A is a graph illustrating a first relationship of constraints for the system of FIG.2; 

FIG. 5B is a flowchart illustrating the process flow for the relationship of FIG. 5 A; 

FIG. 6A is a graph illustrating a second relationship of constraints for the system of 

FIG.2; 

FIG. 6B is a flowchart illustrating the process flow for the relationship of FIG. 6A; 
1 5 FIG. 7A is a graph illustrating a third relationship of constraints for the system of FIG.2; 

and 

FIG. 7B is a flowchart illustrating the process flow for the relationship of FIG. 7 A. 

DETAILED DESCRIPTION 
20 In the following description of an embodiment of the invention, reference is made to 

the accompanying drawings which form a part hereof, and which is shown by way of 
illustration a specific embodiment in which the invention may be practiced. It is to be 
understood that other embodiments may be utilized as structural changes may be made 
without departing from the scope of the present invention. 

25 

Hardware Environment 

FIG. 1 is a diagram illustrating a computer hardware environment that could be used 
in accordance with the present invention. A computer server 100 comprises an optimization 
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system 102 and multiple processors (Processor- 1 104 and Processor-N 106). The computer 
server 100 is connected to data stores, including data store 108, which stores input files of 
raw data to be stored into a database, data store 110, which stores partitions of a database, 
and data store 1 12, which stores output files for holding error information and key/RID 

5 information. Under the direction of the optimization system 102, a load process at each 
processor 104, 106 loads data from an input file 108 into a database 110, while storing 
information in the output files 1 12. Although the input file, the database, and the output files 
have been illustrated at different data stores, it is to be understood that they may all be stored 
on one data store or in some combination at multiple data stores {e.g. , with the input and 

1 0 output files at one data store and the database at another data store). 

FIG. 2 is a diagram illustrating a computer hardware environment that adds further 
detail to the hardware environment of FIG. 1 . In the environment, a computer system 202 is 
comprised of one or more processors 232, 234 connected to one or more data storage devices 
204 and 206 that store one or more relational databases in tablespaces. The data storage 

1 5 devices 204 and 206 may comprise a fixed or hard disk drive, a floppy disk drive, a CDROM 
drive, a tape drive, or other device. 

Operators of the computer system 202 use a standard operator interface 208, such as 
IMS/DB/DC®, CICS®, TSO®, OS/390®, ODBC® or other similar interface, to transmit 
electrical signals to and from the computer system 202 that represent commands for 

20 performing various search and retrieval functions, termed queries, against the databases. In 
the present invention, these queries conform to the Structured Query Language (SQL) 
standard, and invoke functions performed by Relational DataBase Management System 
(RDBMS) software. 

The SQL interface has evolved into a standard language for RDBMS software and 
25 has been adopted as such by both the American National Standards Institute (ANSI) and the 
International Standards Organization (ISO). The SQL interface allows users to formulate 
relational operations on the tables either interactively, in batch files, or embedded in host 
languages, such as C and COBOL. SQL allows the user to manipulate the data. 
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In an embodiment of the present invention, the RDBMS software comprises the 
DB2® product offered by IBM for the OS/390® operating systems. Those skilled in the art 
will recognize, however, that the present invention has application to any RDBMS software, 
whether or not the RDBMS software uses SQL. 

5 As illustrated in FIG. 2, the DB2® system for the OS/390® operating system includes 

three major components: the Internal Resource Lock Manager (IRLM) 210, the Systems 
Services module 212, and the Database Services module 214. The IRLM 210 handles 
locking services for the DB2® system, which treats data as a shared resource, thereby 
allowing any number of users to access the same data simultaneously. Thus concurrency 

10 control is required to isolate users and to maintain data integrity. The Systems Services 
module 212 controls the overall DB2® execution environment, including managing log data 
sets 206, gathering statistics, handling startup and shutdown, and providing management 
support. 

At the center of the DB2® system is the Database Services module 214. The Database 
15 Services module 214 contains several submodules, including the Relational Database System 
(RDS) 216, the Data Manager 218, the Buffer Manager 220, and other components 222 such 
as an SQL compiler/interpreter. These submodules support the functions of the SQL 
language, i.e. definition, access control, interpretation, compilation, database retrieval, and 
update of user and system data. 
20 The computer system 202 comprises an optimization system 230 and multiple 

processors (Processor- 1 232 and Processor-N 234). The computer system 202 is also 
connected to a data store 236, which stores input files of raw data to be stored into a database, 
data store 238, which stores a database, and data store 240, which stores output files for 
holding error information and key/RID information. 
25 The present invention is generally implemented using load control statements 

executed under the control of the Database Services module 214. The Database Services 
module 214 retrieves or receives the load control statements, wherein the load control 
statements are generally stored in a text file on the data storage devices 204 and 206 or are 
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interactively entered into the computer system 202 by an operator sitting at a monitor 226 via 
operator interface 208. The Database Services module 214 and the Optimization System 230 
then derive or synthesize instructions from the load control statements for execution by the 
computer system 202. 

5 Generally, the RDBMS software, the load control statements, and the instructions 

derived therefrom, are all tangibly embodied in a computer-readable medium, e.g. one or 
more of the data storage devices 204 and 206. Moreover, the RDBMS software, the load 
control statements, and the instructions derived therefrom, are all comprised of instructions 
which, when read and executed by the computer system 202, causes the computer system 202 

10 to perform the steps necessary to implement and/or use the present invention. Under control 
of an operating system, the RDBMS software, the load control statements, and the 
instructions derived therefrom, may be loaded from the data storage devices 204 and 206 into 
a memory of the computer system 202 for use during actual operations. 

Thus, the present invention may be implemented as a method, apparatus, or article of 

1 5 manufacture using standard programming and/or engineering techniques to produce software, 
firmware, hardware, or any combination thereof. The term "article of manufacture" (or 
alternatively, "computer program product") as used herein is intended to encompass a 
computer program accessible from any computer-readable device, carrier, or media. Of 
course, those skilled in the art will recognize many modifications may be made to this 

20 configuration without departing from the scope of the present invention, 

Those skilled in the art will recognize that the environment illustrated in FIG. 2 is not 
intended to limit the present invention. Indeed, those skilled in the art will recognize that 
other alternative hardware environments may be used without departing from the scope of the 
present invention. 

25 
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Technique for Determining an Optimal Number of Tasks in a 
Parallel Database Loading System with Memory Constraints 
An embodiment of the invention provides an optimized parallel database loading system 
referred to herein as an optimization system 230. In particular, the optimization system 230 
5 optimizes a number of load and sort processes to be invoked simultaneously based on several 
factors, such as processing capabilities, overall and individual process memory constraints, the 
number of partitions and the number of indexes. 

The system shown in FIG. 3, illustrates a database with four partitions 3 1 0, 3 1 2, 3 14, 3 1 6 
and input data in separate files 300, 302, 304, 306. It should be recognized by the skilled artisan 
10 that the number of partitions and separate input files may be varied to accommodate differing 
data structures. Data is loaded into the partitions by parallel load processes, each operating on 
a single partition and single input file at a time. The database is indexed, and as rows of data are 
loaded 308, index keys and row identifiers are extracted 317 and piped to the parallel sort 
processes 3 1 8. The sorted keys and row identifiers are then piped to parallel build processes 320 
15 that build concatenated indexes 322, 324, 326. 

The optimization system 230 optimizes the number of load, sort and build processes to 
be executed in parallel, based on the constraints imposed by the available hardware. For 
example, the number of sort processes is limited by the number of build processes and vice 
versa. Note that one build process is used to build one index. This mutual limitation exists 
20 because fewer number of sort processes results in multiple sort process sorting keys belonging 
to one index. Such an arrangement emits sorted keys sorted first by the index, so there are keys 
piped to multiple build processes. This only allows one of the build processes to be active at a 
time, thereby negating any increased processing times associated with initiating additional build 
processes. 

25 Additionally, build processes cannot start processing keys until all of the keys have been 

sorted (i.e., sort processes are said to be "blocking"). When sorting is finished, the load 
processing will have already finished. This relationship insures that load, sort, and build 
processes are never simultaneously active, and thus for one instant in time, it is the number of 
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load and sort processes that is the limiting factor for the overall parallel database processing 
speed. Given this relationship, optimizing the number of load and sort processes will optimize 
the processing speed. 

Four constraints govern the number of processes to be started: 

1 . The number of sort processes cannot exceed the number of indexes to be 
built 

2. The number of load processes cannot exceed the number of partitions to 
be loaded. 

3^ The sum of the number of sort processes and the number of load 
processes cannot exceed the processing capabilities of the server. 

4. The memory used by the sort and load processes cannot exceed the 
amount of memory available for these processes. 

With the first two constraints, N s is the number of sort processes, N L is the number of 
load processes, Nj is the number of indexes to be built, and N P is the number of partitions to be 
loaded. The relationship of these factors, can be expressed by the following: 

N s <=N, andN L <=N P 

The third constraint can be expressed by the relationship: 

N L + N s <= mP 
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In the above relationship, e.g. , P is the number of processors available on the server, and 
m is a multiplier that is a function of the speed of the processors. Although the loading and 
sorting processes are processor intensive operations, increased CPU speeds permit simultaneous 
handling of multiple processes. 
5 The fourth constraint can be expressed by the relationship: 

aN L + bN s + c < M 

In the above relationship, e.g. , M is the amount of available memory, a is the amount of 
10 memory required for each load process, b is the amount of memory required for each sort 
process, and c is the amount of memory required for a main process that directs the load, sort, 
and build processes. 

FIG. 4 is a flow diagram illustrating the technique performed by the optimization system 
230. The optimization system 230 begins by determining the number of partitions to be loaded 

1 5 400. Next, the number of indexes to be built is determined 410. The optimization system then 
considers the processing capabilities 420 available to it which is a function of the number of 
processors and the speed of the processors. Next, the amount of available memory 430 available 
to implement the optimized technique is determined. With all of the above information, the 
optimization system 230 then determines which of three domains representing a set of 

20 constraints should apply 440. Once the applicable domain is established, the rules for that 
domain are applied 450 to optimize system performance. 

The relationship between the various system constraints is expressed graphically and is 
seen in FIGS. 5A-7B. FIG. 5A illustrates a first domain 500 which represents an arrangement 
of the above enumerated constraints where the number of partitions being loaded is sufficiently 

25 large so as to not limit the number of load processes. Line 1 represents the number of indexes, 
line 2 represents the number of partitions being loaded, line 3 represents the processing 
capabilities of the processors, and line 4 represents the memory constraints of the system. The 
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slope of line 4 is negative because the amount of memory required for sort processes is larger 
than the amount of memory required for load processes (b > a). 

The shaded area in FIG. 5 A, represents the possible combinations of sort processes and 
load processes that are permitted given the above enumerated constraints. The point of 
5 intersection of lines 3 and 4, labeled A, represents the maximum number of sort processes that 
can be started with the number of processors constraint. The points on line 3 between point A 
and a point labeled B represent other combinations of numbers of sort and load processes that 
while maximizing the total number of processes, have fewer sort processes and more load 
processes. Points in the shaded area to the left of and above point A represent combinations of 
1 0 numbers of sort and load processes where the number of sort processes is larger than that at point 
A, but where the total number of processes is not maximized. 

The relationship illustrated in FIG. 5A can be additionally described by the following 
relationship which is illustrated in a flow chart in FIG. 5B: 

15 amP + c<M< bmP + c (See Block 510) 

To achieve optimal performance for the system illustrated in FIG. 5 A, the number of sort 
and load processes should be equal because their respective burdens on the processors are 
approximately equal. The additional constraints and the need to maximize the total number of 
20 processes places the optimal point at point A. 

The number of sort and load processes at point A can be calculated by the following 
technique: 



25 



1. 



IfN!<(M-c-amP)/(b-a) 

then N s = N h and N L = min [mP - N b N P ]; or 



(520) 
(530) 
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2. If N P < (bmP + c - M) / (b - a) (540) 
then N L = N P , and N s = min [(M - c - aN P ) / b, NJ; or (550) 



3. ElseN s = (M-c-amP)/(b.a),andN L = mP-N s . (560) 

5 

If the first domain applies and the number of indexes is less than the difference of the 
total amount of available memory, the amount of memory required for a main process, and the 
amount of memory required for each load process multiplied by the processing capabilities, 
divided by the difference of the amount of memory required for each sort process and each load 

1 0 process 520, then the optimal point is reached when the number of sort processes is equal to the 
number of indexes, with the number of load processes equal to the smaller of the difference of 
the processing capabilities and the number of indexes, and the number of partitions. 

If the first domain applies and the above condition does not apply, and the number of 
partitions is less than the difference of the sum of the amount of memory required for each sort 

1 5 process multiplied by the processing capabilities, the total amount of memory required for a main 
process, and the amount of memory required for each load process, divided by the difference of 
the amount of memory required for each sort process and each load process 540, then the 
optimal point is reached when the number of load processes is equal to the number of partitions 
and the number of sort processes is equal to the smaller of the difference of the total amount of 

20 available memory, the amount of memory required for the main process, and the amount of 
memory for each load process multiplied by the number of partitions, divided by the memory 
required for each sort process, and the number of indexes. 

If the first domain applies and neither of the above two conditions applies, then the 
optimal point is reached when number of sort processes is equal to the difference of the total 

25 amount of available memory, the amount of memory required for a main process, and the amount 
of memory required for each load process, divided by the difference of the amount of memory 
required for each sort process and each load process, and when the number of load processes is 
equal to the difference of the processing capabilities and the number of sort processes 560. 
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FIG. 6 A illustrates a second domain 600 in which there are no memory constraints. Here, 
the numbered lines represent the same values and the optimal mix of processes is achieved by 
attempting to equalize N L and N s . In this example, there is sufficient memory so that there is no 
memory-related constraint on the number of processes and the shaded area is bounded by the 
constraints for the number of indexes, the processing capabilities, and the number of partitions 
being loaded. This can be expressed by the following relationship which is illustrated in a flow 
chart in FIG. 6B: 

M>bmP + c (610) 
The optimization system 230 determines the optimal point by the following technique: 

1. IfN,<mP/2, (620) 
then N s = N„ and N L = min [mP - N s , N P ]; or (630) 

2. IfN P <mP/2, (640) 
then N L = N P , and N s = min [ mP - N L , NJ; or (650) 

3. ElseN L = N s = mP/2. (660) 



If the second domain applies and the number of indexes is less than half of the processing 
capabilities 620, then the optimal point is reached when the number of sort processes is equal 
to the number of indexes and the number of load processes is equal to the smaller of the 
difference of the processing capabilities available for the load processes and the number of sort 
25 processes, and the number of partitions. 

If the second domain applies and the number of partitions is less than half of the 
processing capabilities 640, then the optimal point is reached when the number of load processes 
is equal to the number of partitions and the number of sort processes is equal to the smaller of 
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the difference of the processing capabilities available for the sort processes and the number of 

load processes, and the number of indexes. 

Otherwise, if neither of the two above conditions are present for the second domain, then 

the optimal point is when the number of load processes is equal to the number of sort processes 
5 and which is equal to half of the processing capabilities 660. In one embodiment, the processing 

requirements for loading and sorting are equal, but it should be recognized by one of ordinary 

skill in the art that the respective processing requirements for loading and sorting may be 

different depending on the system utilized, thereby shifting the optimal point to a point where 

the number of load processes and sort processes are not equal. 
10 In a third domain 700 illustrated in FIG. 7A, there is insufficient available memory to 

start mP total processes and so the shaded area is bound solely by the memory constraints. This 

relationship is illustrated in FIG. 7B: 



15 



M<amP + c (710) 

In this case, the optimization system 230 determines the optimal point by the following 
technique: 



1. IfN,<(M-c)/(a + b) (720) 
20 then N s = N b and N L = min [(M - c - bN^ / a, N P ]; or (730) 

2. IfN P <(M-c)/(a + b) (740) 
then N L - N P , and N s - min \M - c- aN P ) / b, NJ; or (750) 



25 3. EIseN L = N s = (M-c)/(a + b). (760) 

If it is determined that the third domain applies and if the number of indexes is less than 
the difference of the total amount of available memory and the amount of memory required for 
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a main process, divided by the amount of memory required for each load and sort process 720, 
then the optimal point is reached when the number of sort processes is equal to the number of 
indexes with the number of load processes equal to the smaller of the difference of a total amount 
of available memory and the amount of memory required for the main process, and the amount 

5 of memory for each sort process multiplied by the number of indexes, divided by the memory 
required for each load process, and the number of partitions. 

If the third domain applies and the number of partitions is less than the difference of the 
total amount of available memory and the amount of memory required for a main process, 
divided by the amount of memory required for each load and sort process 740, then the optimal 

1 0 point is reached when the number of load processes is equal to the number of partitions with the 
number of sort processes equal to the smaller of the difference of the total amount of available 
memory, the amount of memory required for the main process, and the amount of memory for 
each load process multiplied by the number of partitions, divided by the memory required for 
each sort process, and the number of indexes. 

1 5 If the third domain applies, and neither of the two conditions described above are present, 

then the optimal point is reached when the number of load processes is equal to the number of 
sort processes which is equal to the difference of the total amount of memory available and the 
amount of memory required for a main process, divided by the amount of memory required for 
each load and sort process 230. In one embodiment, the processing requirements for loading and 

20 sorting are equal, but it should be recognized by one of ordinary skill in the art that the respective 
processing requirements for loading and sorting may be different depending on the system 
utilized, thereby shifting the optimal point to a point where the number of load processes and sort 
processes are not equal. 



25 CONCLUSION 

This concludes the description of an embodiment of the invention. The following 
describes some alternative embodiments for accomplishing the present invention. For example, 
any type of computer, such as a mainframe, minicomputer, or personal computer, or computer 
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configuration, such as a timesharing mainframe, local area network, or standalone personal 
computer, could be used with the present invention. 

The foregoing description of an embodiment of the invention has been presented for the 
purposes of illustration and description. It is not intended to be exhaustive or to limit the 
5 invention to the precise form disclosed. Many modifications and variations are possible in light 
of the above teaching. It is intended that the scope of the invention be limited not by this 
detailed description, but rather by the claims appended hereto. 
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